Effective Subspace Clustering with Dimension Pairing in the Presence of High Levels of Noise

Authors

  • Andrew Foss
  • Osmar R. Zaïane
Abstract

Attempts at clustering large and high dimensional data have been made with a focus on scalability. While these approaches remain inefficient for more complex problems, their effectiveness is also questionable because data becomes very sparse in a high dimensional space. If clusters exist in the data, they tend to remain hidden in some unidentified sub-spaces. So far, the few solutions to this problem have not been able to handle high levels of noise and are often inefficient. Most clustering solutions also require fine-tuning of parameter settings, which are difficult or impossible to set in advance, especially for a novice user. We propose a new method, MAXCLUS, which first identifies sub-spaces where clusters could be located and then pinpoints the clusters in each sub-space. MAXCLUS has proved robust even with high levels of noise and is very efficient and accurate with little or no parameter adjustment on a wide range of problems, outperforming existing approaches.

Similar resources

Acoustic correlated sources direction finding in the presence of unknown spatial correlation noise

In this paper, a new method is proposed for DOA estimation of correlated acoustic signals in the presence of unknown spatial correlation noise. By generating a matrix from the signal subspace with the Hankel-SVD method, the correlated source information is extracted from each eigenvector. Then a joint-diagonalization structure is constructed from the signal subspace and, based on it, independent...

An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...

A Novel Noise Reduction Method Based on Subspace Division

This article presents a new subspace-based technique for reducing noise in time-series signals. In the proposed approach, the signal is initially represented as a data matrix. Then, using Singular Value Decomposition (SVD), the noisy data matrix is divided into a signal subspace and a noise subspace. In this subspace division, each derivative of the singular values with respect to rank order is u...

Isotropic Constant Dimension Subspace Codes

In the network coding setting, a constant dimension code is a set of k-dimensional subspaces of F_q^n. If F_q^n is a nondegenerate symplectic vector space with bilinear form f, an isotropic subspace U of F_q^n is a subspace such that f(x, y) = 0 for all x, y ∈ U. We introduce isotropic subspace codes simply as a set of isotropic subspaces and show how the isotropic property is used in the decoding process, then...

Publication date: 2006